azure-hunting/Azure Kubernetes Service Guided Hunting.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "# Azure Kubernetes Service (AKS) Hunting\n", "\n", "## Notebook Details\n", "**Python Version**: 3.8 \n", "\n", "**Platforms Supported**: Azure Machine Learning (AML) and Visual Studio Code \n", "\n", "**Data Sources Leveraged**:\n", "* Kubernetes control plane logging. To turn this on, head to your AKS cluster in the Azure Portal and find the **Diagnostic settings** entry in the left sidebar. Choose **+ Add diagnostic setting** and select the log categories you are interested in recording. \n", " * `kube-apiserver` logs keep track of all of the requests made to your Kubernetes API server. They’re great for spotting attackers peeking around or attempting to make changes.\n", " * `kube-audit` logs provide a time-ordered sequence of all of the actions taken in the cluster and are brilliant for security auditing. It is a superset of the information contained in the kube-apiserver logs and includes operations which are triggered inside the Kubernetes control plane. You can configure your audit policy using the audit.k8s.io/v1/Policy resource type by following the instructions [here](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/).\n", "* AKS VMSS `auditd` logging. Open Source projects like [aks-auditd](https://github.com/mang0kitty/aks-auditd) and Microsoft's [OMS Agent for Linux](https://github.com/microsoft/OMS-Agent-for-Linux) can help you set this up. \n", "\n", "## Description\n", "This notebook contains hunting hypotheses and queries you can use and expand upon to hunt for adversary activity on your Azure Kubernetes Service (AKS) cluster. 
\n", "\n", "## Contents\n", "* Prepare your notebook environment\n", "* Hunting Hypotheses and Queries\n", " * Tips for hunting\n", " * Initial Access\n", " * Command execution on your cluster's containers\n", " * Privilege Escalation\n", " * Deployment of privileged containers with the intention of escaping onto AKS worker nodes\n", " * Pivoting on high risk host volume path mounts\n", " * Execution\n", " * Container and worker node kernel activity (Syslog)\n", " * Stack counting all program execution on worker node and containers\n", " * Investigating potential anomalies in program and command line execution\n", " * Determine the distribution of program executions and arguments passed to it\n", " * Program execution associated with data exfiltration and program installations\n", " * Lateral Movement\n", " * Move laterally to cloud resources by requesting an access token from the IMDS server\n", " * Anomalous requests to the Kubernetes API Server\n", " * Persistence\n", " * Unusual Kubernetes objects being deployed\n", " * Container Image poisoning\n", " * Baseline of container images and image registries" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Prepare your environment\n", "This notebook uses `kqlmagic` and datasets within your Log Analytics workspace to support your AKS hunting. The following section installs `kqlmagic` and authenticates to your Log Analytics workspace." ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Install Pre-requisites\n", "This section of the notebook installs `kqlmagic`, which this notebook uses to execute native Kusto queries."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1649336503065 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "%%capture\n", "import sys\n", "!{sys.executable} -m pip install Kqlmagic --no-cache-dir --upgrade\n", "%reload_ext Kqlmagic" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Connect to your Log Analytics Workspace\n", "The following cell connects you with the Log Analytics Workspace (LAW) within your Azure Tenant. You need to copy the code outputted by the cell and provide it in the [DeviceLogin site](https://microsoft.com/devicelogin).\n", "\n", "⚠️ Please update the `LOG_ANALYTICS_WORSPACE_ID` to specify the Workspace ID of your Log Analytics Workspace. Your LAW ID can be found as described [here](https://docs.microsoft.com/bonsai/cookbook/get-law-id). " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1649336669317 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "LOG_ANALYTICS_WORSPACE_ID = \"YOUR LOG ANALYTICS WORKSPACE ID\" \n", "%kql logAnalytics://code;workspace=LOG_ANALYTICS_WORSPACE_ID;" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Hunting Hypotheses and Queries\n", "When hunting for an adversary, the goal is not to enumerate every tactic, technique and procedure they can do. The goal is to find and look at the specific junctions an adversary would need to cross to execute a successful attack. 
To do this, we have provided you with some hunting queries you can use as signals to indicate something interesting is happening as well as some additional queries and context to help you dive deeper.\n", "\n", "⚠️ All these queries serve as starting points for your hunting and investigations. They contain variables that can be expanded on and tweaked to be more applicable to your own environment!" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Tips for Hunting\n", "* When creating your hunt hypotheses, think about what actions an adversary would need to perform to attack your workloads.\n", "* Pay attention to the `User Agent` used to make requests to Azure Resource Manager (ARM), and in this case, the Kubernetes API. For example, for production workloads, it might be unusual for you to interact with your cluster using `kubectl` or the `Azure CLI`. If you see this activity, it is a signal that someone is performing hands-on-keyboard activity on your cluster or subscription. " ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Initial access\n", "### Command execution on your cluster's containers\n", "**Hunt Hypothesis**\n", "> Before an attacker can execute commands on your pod, to look for secrets or escape onto the underlying host, they must first `exec` into the container\n", "\n", "This hypothesis allows us to look for an adversary at a key juncture of their attack. Using `kubectl exec` to execute commands on your container is advantageous to an attacker: \n", "1. The commands being run aren’t always logged and as visible as commands specified in the container image. \n", "2. Enables them to access the service account tokens for that pod. By default, every pod has a service account mounted whose permission is determined by role bindings. 
In a production cluster, even on a worker node, there is usually at least one pod whose mounted token belongs to a service account bound by a `ClusterRoleBinding`, which gives an attacker access to do things like create pods or view secrets in all namespaces.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "%%kql\n", "// Looks for command execution into pod containers. Command executions by the aksProblemDetector user into the tunnel-front pod are filtered out; this is known-good, regular activity.\n", "AzureDiagnostics\n", "| extend log_s=parse_json(log_s)\n", "| extend verb = tostring(log_s[\"verb\"])\n", "| extend objectRef = log_s[\"objectRef\"]\n", "| extend username = tostring(log_s[\"user\"][\"username\"])\n", "| extend userAgent = tostring(log_s[\"userAgent\"])\n", "| extend requestURI = tostring(log_s[\"requestURI\"])\n", "| extend resource = tostring(objectRef[\"resource\"])\n", "| where verb == \"create\" \n", "| where requestURI contains \"/exec\"\n", "| where username != \"aksProblemDetector\" and requestURI !endswith \"/exec?command=ls&container=tunnel-front&stderr=true&stdout=true&timeout=20s\"\n", "| summarize TimeStamps=make_set(TimeGenerated) by verb, resource, PodName=pod_s, requestURI, username, userAgent, SubscriptionId, tostring(objectRef)\n" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "\n", "## Privilege Escalation\n", "\n", "### Container Escape\n", "By default, a container is isolated from the host system's network and memory address space by using the Linux kernel's [`cgroups`](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cgroups.html) and namespace features. 
If a pod is \"privileged\", its containers are essentially running without these isolation constructs, which gives the container nearly all the same access as processes running on the host. \n", "\n", "This gives an attacker a number of advantages:\n", "\n", "1. Access to secrets on the underlying worker node: \n", "- User account secrets placed by `kubeadm` in `/etc/kubernetes`. Most other certificates are stored in `/etc/kubernetes/pki` \n", "- `/etc/kubernetes/azure.json` on the host worker node, which contains a service principal that has access (by default, Contributor) to all resources in the `MC_` resource group. \n", "- The `kubeconfig` file on the worker VM, which contains the kubelet's service account token. This service account token has permissions to request all the cluster's secrets (depending on your RBAC configuration). \n", "- Secrets in `tmpfs` - those stored in memory on the worker node \n", "\n", "2. Allows an attacker to run applications directly on the host. This gives an adversary a stealthy backdoor to your cluster.\n", "\n", "There are two primary methods for performing a container escape:\n", "\n", "1. Mount the host file system and escalate privileges to get a full shell on the node. An attacker can do this by deploying a pod with one or more of the following privileged configurations:\n", "* The pod's `securityContext` set to `privileged`.\n", "* A privileged `hostPath` mount\n", "* [An exposed docker socket](https://blog.quarkslab.com/why-is-exposing-the-docker-socket-a-really-bad-idea.html)\n", "* The host process ID namespace exposed by setting `hostPID` to `true` in the pod's spec.\n", "\n", "\n", "2. Exploit `cgroups` to get interactive root access on the node. A pre-requisite for this attack is to `exec` into the container itself, which the above hunting hypothesis should find. 
Read this [blog post](https://blog.trailofbits.com/2019/07/19/understanding-docker-container-escapes/) for an example of a container escape exploiting the Linux `cgroups` v1 `notify_on_release` feature. \n", "\n", "**Hunt Hypothesis**\n", "> An attacker looking to escape a container will deploy a privileged container or modify an existing pod's configuration to give them elevated access to the host's process and network address space\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1649430122752 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "// Hunting query to find deployment of privileged pods \n", "let lookbackStart = ago(50d);\n", "let lookbackEnd = now();\n", "let timeStep = 1d;\n", "AzureDiagnostics\n", "// Filter to time range you want to examine\n", "| where TimeGenerated between(lookbackStart..lookbackEnd)\n", "| extend log_s=parse_json(log_s)\n", "| extend verb = tostring(log_s[\"verb\"])\n", "| extend objectRef = log_s[\"objectRef\"]\n", "| extend requestURI = tostring(log_s[\"requestURI\"])\n", "| extend resource = tostring(objectRef[\"resource\"])\n", "| where verb == \"create\"\n", "| where requestURI !contains \"/exec\"\n", "| where resource == \"pods\"\n", "| extend requestObject = log_s[\"requestObject\"]\n", "| extend spec = requestObject[\"spec\"]\n", "| extend containers = spec[\"containers\"][0]\n", "| extend username = tostring(log_s[\"user\"][\"username\"])\n", "| extend userAgent = tostring(log_s[\"userAgent\"])\n", "| project\n", " TimeGenerated,\n", " containerName=tostring(containers[\"name\"]),\n", " containerImage=tostring(containers[\"image\"]), \n", " securityContext=tostring(containers[\"securityContext\"]), \n", " volumeMounts=tostring(containers[\"volumeMounts\"]), \n", " namespace=tostring(objectRef[\"namespace\"]),\n", " username,\n", " 
userAgent, \n", " containers, \n", " requestObject, \n", " objectRef, \n", " spec\n", "| where isnotempty(securityContext) \n", "// Filtering for cases where the container has a privileged security context or the host process namespace is exposed\n", "| where parse_json(todynamic(securityContext)[\"privileged\"]) == \"true\" or parse_json(todynamic(spec)[\"hostPID\"]) == \"true\" \n", "| summarize Count=count() by bin(TimeGenerated, timeStep), containerImage, namespace, containerName\n", "| render timechart \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Pivoting on high risk host volume path mounts\n", "This query identifies containers deployed to your cluster whose configuration exposes the underlying worker node's file system. This is a well-known configuration that enables container escape.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "let highRiskHostVolumePaths = datatable (path: string) [\n", " \"/\",\n", " \"/var/log\",\n", " \"/var/run/docker.sock\"\n", "];\n", "let _startLookBack = ago(1d);\n", "let _endLookBack = now();\n", "AzureDiagnostics\n", "| where TimeGenerated between (_startLookBack.._endLookBack)\n", "| extend log_s=parse_json(log_s)\n", "| extend verb = tostring(log_s[\"verb\"])\n", "| extend objectRef = log_s[\"objectRef\"]\n", "| extend requestURI = tostring(log_s[\"requestURI\"])\n", "| extend resource = tostring(objectRef[\"resource\"])\n", "| where verb == \"create\"\n", "| where requestURI !contains \"/exec\"\n", "| where resource == \"pods\"\n", "| extend spec = log_s[\"requestObject\"][\"spec\"]\n", "| extend containers = spec[\"containers\"][0]\n", "| extend hostVolumeMounts=spec[\"volumes\"]\n", "| where 
isnotempty(hostVolumeMounts) \n", "| mv-expand hostVolumeMount=hostVolumeMounts\n", "| extend hostVolumeName = hostVolumeMount[\"name\"], hostPath=hostVolumeMount[\"hostPath\"]\n", "| extend hostPathName=hostPath[\"path\"], hostPathType=hostPath[\"type\"]\n", "| where isnotempty(hostPathName)\n", "| where hostPathName has_any(highRiskHostVolumePaths) \n", "| project \n", " TimeGenerated, \n", " podName=tostring(objectRef[\"name\"]), \n", " containerName=tostring(containers[\"name\"]), \n", " containerImage=tostring(containers[\"image\"]), \n", " namespace=tostring(objectRef[\"namespace\"]), \n", " hostVolumeName, \n", " hostPathName, \n", " hostPathType, \n", " securityContext=tostring(containers[\"securityContext\"]), \n", " volumeMounts=containers[\"volumeMounts\"]" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Execution\n", "### Container and worker node kernel activity (Syslog)\n", "The following queries require you to enable `auditd` logging on your AKS worker node VMSS. Open Source projects like [aks-auditd](https://github.com/mang0kitty/aks-auditd) and Microsoft's [OMS Agent for Linux](https://github.com/microsoft/OMS-Agent-for-Linux) can help you set this up. \n", "\n", "`auditd` logging provides you with an easy and highly configurable way to gain visibility into your AKS worker node and container kernel level activity. If you are running a multi-tenant cluster, having visibility into your AKS worker node activity is critical, and the Kubernetes API server logs aren't always enough. `auditd` fills this gap.\n", "\n", "In this case, we are going to use this audit logging to view syscall activity, primarily program executions.\n", "\n", "**Hunt Hypothesis**\n", "> The activity of an attacker with the ability to execute commands on an AKS container or VMSS worker node will differ from the baseline activity of your AKS cluster."
] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### Stack counting all program execution on worker node and containers\n", "This query stack counts all the programs started following a syscall call to `execve` (59). This gives us a high level overview of the processes running on our cluster.\n", "\n", "Some interesting program executions to look out for include the following:\n", "* `azcopy` and `tar` - Can be used to exfiltrate credentials to attacker owned blob storage\n", "* `curl` and `wget` - Can be used to install executables on the host\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "let _startTime = ago(30d);\n", "let _endTime = now();\n", "Syslog\n", "| where TimeGenerated between(_startTime.._endTime)\n", "| where Facility == \"authpriv\"\n", "| where ProcessName == \"audispd\"\n", "| parse SyslogMessage with * \"type=\" type \" msg=audit(\" EventID \"): \" info\n", "| extend KeyValuePairs = array_concat(\n", " extract_all(@\"([\\w\\d]+)=([^ ]+)\", info),\n", " extract_all(@\"([\\w\\d]+)=\"\"([^\"\"]+)\"\"\", info))\n", "| mv-apply KeyValuePairs on \n", "(\n", " extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))\n", " | summarize Info=make_bag(p)\n", ")\n", "| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID\n", "| where EventInfo[\"SYSCALL\"][\"syscall\"] == \"59\"\n", "| summarize Total=count(), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated) by tostring(EventInfo[\"SYSCALL\"][\"exe\"])\n", "| order by Total asc" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### Investigating potential anomalies in program and command line execution\n", 
"\n", "The following query isolates program execution and the command line arguments passed to the program. We can use this to find anomalous programs executed by the host and any deviations from how this program is normally used. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "let _startTime = ago(30d);\n", "let _endTime = now();\n", "let TimeSeriesAnomalies = Syslog\n", "| where TimeGenerated between(_startTime.._endTime)\n", "| where Facility == \"authpriv\"\n", "| where ProcessName == \"audispd\"\n", "| parse SyslogMessage with * \"type=\" type \" msg=audit(\" EventID \"): \" info\n", "| extend KeyValuePairs = array_concat(\n", " extract_all(@\"([\\w\\d]+)=([^ ]+)\", info),\n", " extract_all(@\"([\\w\\d]+)=\"\"([^\"\"]+)\"\"\", info))\n", "| mv-apply KeyValuePairs on \n", "(\n", " extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))\n", " | summarize Info=make_bag(p)\n", ")\n", "| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID\n", "| where EventInfo[\"SYSCALL\"][\"syscall\"] == \"59\"\n", "| extend Exe = tostring(EventInfo[\"SYSCALL\"][\"exe\"])\n", "| extend CommandLine = EventInfo[\"EXECVE\"]\n", "| mv-apply CommandLine on ( \n", " extend key = tostring(bag_keys(CommandLine)[0])\n", " | where key matches regex @\"^a\\d+$\"\n", " | parse CommandLine[key] with '\"' Argument '\"'\n", " | project Argument = iff(indexof(Argument, \" \") >= 0, CommandLine[key], Argument)\n", " | summarize CommandLine = make_list(Argument, 50)\n", " | extend CommandLine = strcat_array(CommandLine, \" \")\n", ")\n", "|make-series Total=count() on TimeGenerated from _startTime to _endTime step 1h by Exe, CommandLine\n", "| extend (anomalies, score, baseline) = series_decompose_anomalies(Total, 1.5, -1, 
'linefit')\n", "| mv-expand Total to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double),score to typeof(double), baseline to typeof(long);\n", "TimeSeriesAnomalies" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### Determine the distribution of program executions and arguments passed to it\n", "In this scenario, we are using `basket` to find the patterns in what programs are executed and how they are used (command line arguments). In this query, we are ordering the output to show us the most infrequent program executions across your AKS cluster.\n", "\n", "You can also pipe the output of the above program execution time series anomaly query to `basket` in order to enrich it with information on how frequent that pattern (executable name and command line arguments) was found." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "let _startTime = ago(30d);\n", "let _endTime = now();\n", "Syslog\n", "| where TimeGenerated between(_startTime.._endTime)\n", "| where Facility == \"user\"\n", "| where ProcessName == \"audispd\"\n", "| parse SyslogMessage with * \"type=\" type \" msg=audit(\" EventID \"): \" info\n", "| extend KeyValuePairs = array_concat(\n", " extract_all(@\"([\\w\\d]+)=([^ ]+)\", info),\n", " extract_all(@\"([\\w\\d]+)=\"\"([^\"\"]+)\"\"\", info))\n", "| mv-apply KeyValuePairs on \n", "(\n", " extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))\n", " | summarize Info=make_bag(p)\n", ")\n", "| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID\n", "| where EventInfo[\"SYSCALL\"][\"syscall\"] == \"59\"\n", "| extend Exe = tostring(EventInfo[\"SYSCALL\"][\"exe\"])\n", "| extend CommandLine = 
EventInfo[\"EXECVE\"]\n", "| mv-apply CommandLine on ( \n", " extend key = tostring(bag_keys(CommandLine)[0])\n", " | where key matches regex @\"^a\\d+$\"\n", " | parse CommandLine[key] with '\"' Argument '\"'\n", " | project Argument = iff(indexof(Argument, \" \") >= 0, CommandLine[key], Argument)\n", " | summarize CommandLine = make_list(Argument, 50)\n", " | extend CommandLine = strcat_array(CommandLine, \" \")\n", ")\n", "| project HostName, Exe, CommandLine\n", "| evaluate basket(0.001)\n", "| order by Percent asc" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "This query identifies commands that are known to be used by adversaries once they have compromised a host. \n", "**Hunt Hypothesis**\n", "> An attacker will use well known programs on a pod or AKS VMSS worker node to exfiltrate data or expand their foothold on the cluster\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "let riskyCommands = datatable(command: string)[\n", " \"azcopy\",\n", " \"wget\",\n", " \"tar\",\n", " \"curl\"\n", "];\n", "Syslog\n", "| where Facility == \"authpriv\"\n", "| where ProcessName == \"audispd\"\n", "| parse SyslogMessage with * \"type=\" type \" msg=audit(\" EventID \"): \" info\n", "| extend KeyValuePairs = array_concat(\n", " extract_all(@\"([\\w\\d]+)=([^ ]+)\", info),\n", " extract_all(@\"([\\w\\d]+)=\"\"([^\"\"]+)\"\"\", info))\n", "| mv-apply KeyValuePairs on \n", "(\n", " extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))\n", " | summarize Info=make_bag(p)\n", ")\n", "| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID\n", "| where EventInfo[\"PATH\"][\"name\"] has \"curl\"\n", "| mv-expand bagexpansion=array 
Arg=EventInfo[\"EXECVE\"]\n", "| summarize arg_min(TimeGenerated, HostName, EventInfo), Args=strcat_array(make_list_if(Arg[1], Arg[0] != \"argc\", 20), \" \") by EventID\n", "| where Args has_any(riskyCommands)\n", "| project EventID, TimeGenerated, HostName, Args, CurrentWorkingDirectory=tostring(EventInfo[\"CWD\"][\"cwd\"]), Path=tostring(EventInfo[\"PATH\"][\"name\"])\n", "| summarize TotalRequests=count(), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated) by HostName, Args" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Lateral Movement\n", "### Move laterally to cloud resources by calling the IMDS server\n", "An attacker might move laterally to other resources in your Azure subscription by using `curl` to retrieve Managed Service Identity (MSI) access tokens from the Azure Instance Metadata Service (IMDS). The following API requests to the IMDS service are worth investigating if you see them on your cluster:\n", "* Request for metadata on the VM\n", "* [Request for access tokens](https://docs.microsoft.com/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token)\n", "\n", "**Hunt Hypothesis**\n", "> An attacker looking to move laterally to cloud resources accessible to the cluster will make a request to the IMDS server to retrieve access tokens for the MSIs attached to the underlying AKS worker nodes\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "Syslog\n", "| where Facility == \"authpriv\"\n", "| where ProcessName == \"audispd\"\n", "| parse SyslogMessage with * \"type=\" type \" msg=audit(\" EventID \"): \" info\n", "| extend KeyValuePairs = array_concat(\n", " extract_all(@\"([\\w\\d]+)=([^ ]+)\", info),\n", " extract_all(@\"([\\w\\d]+)=\"\"([^\"\"]+)\"\"\", info))\n", "| mv-apply KeyValuePairs on 
\n", "(\n", " extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))\n", " | summarize Info=make_bag(p)\n", ")\n", "| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID\n", "| where EventInfo[\"PATH\"][\"name\"] has \"curl\"\n", "| mv-expand bagexpansion=array Arg=EventInfo[\"EXECVE\"]\n", "| where Arg[0] != \"argc\"\n", "| where Arg[1] contains \"169.254.169.254\"\n", "| extend IMDSURL = tostring(Arg[1])\n", "| summarize TotalRequests=count(), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated), Hosts=makeset(HostName) by IMDSURL" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "Syslog\n", "| where Facility == \"authpriv\"\n", "| where ProcessName == \"audispd\"\n", "| parse SyslogMessage with * \"type=\" type \" msg=audit(\" EventID \"): \" info\n", "| extend KeyValuePairs = array_concat(\n", " extract_all(@\"([\\w\\d]+)=([^ ]+)\", info),\n", " extract_all(@\"([\\w\\d]+)=\"\"([^\"\"]+)\"\"\", info))\n", "| mv-apply KeyValuePairs on \n", "(\n", " extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))\n", " | summarize Info=make_bag(p)\n", ")\n", "| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID\n", "| where EventInfo[\"PATH\"][\"name\"] has \"curl\"\n", "| mv-expand bagexpansion=array Arg=EventInfo[\"EXECVE\"]\n", "| summarize arg_min(TimeGenerated, HostName, EventInfo), Args=strcat_array(make_list_if(Arg[1], Arg[0] != \"argc\", 20), \" \") by EventID\n", "| where Args contains \"http://169.254.169.254/metadata/identity/oauth2/token\"\n", "| project EventID, TimeGenerated, HostName, Args, CurrentWorkingDirectory=tostring(EventInfo[\"CWD\"][\"cwd\"]), Path=tostring(EventInfo[\"PATH\"][\"name\"])\n", "\n", "\n", "\n" ] }, { 
"cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Anomalous requests to the Kubernetes API Server\n", "If an attacker has gained access to your cluster and is trying to escalate their privileges and move laterally, it is likely that they will make API requests to the Kubernetes API server that are unusual for that cluster. These could include requests to execute commands on the cluster or create new roles and role bindings.\n", "\n", "\n", "**Hunt Hypothesis**\n", "> The request patterns of an attacker with the ability to communicate with your AKS cluster Kubernetes API server will differ from the baseline requests made by the production workloads running on your cluster.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "let _startTime = ago(30d);\n", "let _endTime = ago(1d);\n", "let _timestep = 1h;\n", "let _totalEventThreshold = 5;\n", "let DiagnosticEvents = AzureDiagnostics\n", "| where TimeGenerated between (_startTime.._endTime)\n", "| extend TimeBucket = bin(TimeGenerated, _timestep)\n", "| extend log_s = parse_json(log_s)\n", "| extend verb = tostring(log_s[\"verb\"])\n", "| extend objectRef = log_s[\"objectRef\"]\n", "| extend resourceName = tostring(objectRef[\"name\"])\n", "| extend requestURI = tostring(log_s[\"requestURI\"])\n", "| extend resource = tostring(objectRef[\"resource\"])\n", "| extend username = tostring(log_s[\"user\"][\"username\"])\n", "| extend userAgent = tostring(log_s[\"userAgent\"])\n", "| where isnotempty(resourceName)\n", "| where isnotempty(username);\n", "let K8sAPIRequestAnomalies = DiagnosticEvents\n", "| make-series Total = count() on TimeBucket from bin(_startTime, _timestep) to bin(_endTime, _timestep)+_timestep step _timestep by verb, 
resource, username, userAgent\n", "// More documentation on the series_decompose_anomalies Kusto function can be found here https://docs.microsoft.com/azure/data-explorer/kusto/query/series-decompose-anomaliesfunction\n", "| extend (anomalyFlag, anomalyScore, expectedValue) = series_decompose_anomalies(Total, 5, -1, 'linefit', 0, \"ctukey\")\n", "| mv-expand Total to typeof(double), TimeBucket to typeof(datetime), anomalyFlag to typeof(double), anomalyScore to typeof(double), expectedValue to typeof(double)\n", "| where anomalyFlag > 0 \n", "| where Total > _totalEventThreshold\n", "| order by anomalyScore desc;\n", "K8sAPIRequestAnomalies\n", "| lookup kind=inner DiagnosticEvents on TimeBucket, verb, resource, username, userAgent\n", "//| project TimeGenerated, verb, resource, resourceName, username, requestURI, userAgent, Total, expectedValue, anomalyFlag, anomalyScore\n", "| summarize Events=count(), RequestURIs=make_set(requestURI), resourceNames=make_set(resourceName), expectedValue=any(expectedValue), anomalyScore=any(anomalyScore) by TimeBucket, verb, resource, username, userAgent" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Persistence\n", "### Unusual Kubernetes objects being deployed\n", "Kubernetes is composed of different entities called [objects](https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/). Attackers may deploy different types of objects that are not normally present on your cluster. For example, an attacker might deploy Kubernetes objects like `DaemonSet`s and `Deployment`s that allow an attacker to affect all pods on your cluster, in contrast to a `Pod` object which only allows an attacker to control a single pod. \n", "\n", "Additionally, objects like `Daemonsets` and `Deployments` allow an attacker to bypass some of the configuration change restrictions that prevent someone updating the state of a Pod. 
\n", "**Hunt Hypothesis**\n", "> An attacker looking for persistence on your cluster will deploy Kubernetes objects like `DaemonSet`s and `Deployment`s to get a foothold on all pods on your cluster.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### Stack Counting Kubernetes Objects that are deployed\n", "This query is a simple stack count of the different types of Kubernetes objects deployed in your cluster. The resulting line chart will likely make it clear which objects are regularly created as part of your cluster's operation. You can then remove these objects from the graph until you have zoomed in on the most irregularly deployed or \"spiky\" Kubernetes objects. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql\n", "// Ignore resource creations that are unlikely to be used by an adversary for persistence\n", "let ignoredResources = datatable(type:string)[\n", "\"tokenreviews\",\n", "\"events\",\n", "\"subjectaccessreviews\",\n", "\"selfsubjectaccessreviews\",\n", "\"storageclasses\"\n", "];\n", "let _startLookBack = ago(50d);\n", "let _endLookBack = now();\n", "let _stepTime = 1h;\n", "AzureDiagnostics\n", "| where TimeGenerated between(_startLookBack.._endLookBack)\n", "| extend log_s=parse_json(log_s)\n", "| extend verb = tostring(log_s[\"verb\"])\n", "| extend objectRef = log_s[\"objectRef\"]\n", "| extend requestURI = tostring(log_s[\"requestURI\"])\n", "| extend resource = tostring(objectRef[\"resource\"])\n", "| extend name=tostring(objectRef[\"name\"])\n", "| where verb == \"create\"\n", "| where resource !in(ignoredResources)\n", "| summarize Count=count() by bin(TimeGenerated, _stepTime),resource\n", "| render timechart\n" ] }, { "cell_type": "markdown", "metadata": 
{ "nteract": { "transient": { "deleting": false } } }, "source": [ "### Container image poisoning \n", "An attacker with access to your container image registry credentials might update an existing container image to give themselves a backdoor. If you are not using signed Docker images, this is trivially easy for an attacker to do.\n", "\n", "One way to identify an image poisoning attack is to look for a new image being pushed to Azure Container Registry (ACR) that has **two SHA hashes corresponding to a single image version**. Depending on how your cluster's pods are configured to pull images, it might also be worth looking for unexpected pod restarts following a new image being pushed to your container registry. To identify this activity, you will need to enable specific logging on your ACR as described [here](https://docs.microsoft.com/azure/container-registry/monitor-service). You can then use the \n", "`ContainerRegistryRepositoryEvents` table to find rogue images being pushed. \n", "\n", "**Hunt Hypothesis**\n", "> An attacker will update the contents of an existing image without updating the image version, to align with the pod's existing configuration. \n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Baseline of container images and image registries\n", "Before diving too deep into the behaviour of your cluster, it is useful to run a quick baseline of the container images and container registries being leveraged by your cluster. This helps build your mental model of what containerized workloads are running on your cluster.\n", "\n", "While doing this, it's worth noting that certain container images are more valuable to an attacker than others. These include container images like `busybox`, `alpine`, `ubuntu` and specialized images with offensive security tools installed. 
If your production Kubernetes workloads are composed mostly of custom container images pulled from your private container registry, unexpected container images and/or container registries will stand out easily.\n", "\n", "The following query is a simple baseline of the different container images that have been deployed on your cluster, as well as the container registries they have been pulled from. You can use this as a starting point for looking for any container images that may have been pushed to the cluster by an adversary. \n", "\n", "**Hunt Hypothesis**\n", "> An adversary will deploy container images that allow them to install tooling and expand their reach on the cluster. Attacker-deployed containers might use a different container registry or image from those normally used by your containerized workloads." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } }, "vscode": { "languageId": "kusto" } }, "outputs": [], "source": [ "%%kql \n", "// You can include your private container registry server here if you want to exclude images pulled from there\n", "let _trustedContainerRegistries = datatable ( registry: string)[\n", "\"mcr.microsoft.com\",\n", "];\n", "let _startLookBack = ago(7d);\n", "let _endLookBack = now();\n", "let _timeStep = 1d;\n", "AzureDiagnostics\n", "// Restrict the scan to the look-back window that make-series charts below\n", "| where TimeGenerated between (_startLookBack.._endLookBack)\n", "| extend log_s=parse_json(log_s)\n", "| extend verb = tostring(log_s[\"verb\"])\n", "| extend objectRef = log_s[\"objectRef\"]\n", "| extend requestURI = tostring(log_s[\"requestURI\"])\n", "| extend resource = tostring(objectRef[\"resource\"])\n", "| where verb == \"create\"\n", "| where requestURI !contains \"/exec\"\n", "| where resource == \"pods\"\n", "| extend requestObject = log_s[\"requestObject\"]\n", "| extend spec = requestObject[\"spec\"]\n", "// Note: only the first container in each pod spec is inspected\n", "| extend containers = spec[\"containers\"][0]\n", "// Additional fields are included here if you want more context in a 
table output, rather than a timechart\n", "| project\n", " TimeGenerated,\n", " containerName=tostring(containers[\"name\"]),\n", " containerImage=tostring(containers[\"image\"]), \n", " securityContext=tostring(containers[\"securityContext\"]), \n", " volumeMounts=tostring(containers[\"volumeMounts\"]), \n", " namespace=tostring(objectRef[\"namespace\"]), \n", " containers, \n", " objectRef \n", "| where isnotempty(containerImage)\n", "| where not(containerImage has_any(_trustedContainerRegistries))\n", "| make-series Count=count() on TimeGenerated from _startLookBack to _endLookBack step _timeStep by containerImage\n", "| render timechart \n" ] } ], "metadata": { "kernel_info": { "name": "azureml_py38_pt_tf" }, "kernelspec": { "display_name": "Python 3.8 - Pytorch and Tensorflow", "language": "python", "name": "azureml_py38_pt_tf" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" }, "microsoft": { "host": { "AzureML": { "notebookHasBeenCompleted": true } } }, "nteract": { "version": "nteract-front-end@1.0.0" } }, "nbformat": 4, "nbformat_minor": 0 }
